{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \"LOG->使用DirectML进行GPU模型训练\"\n",
    "> \"我的RX480又可以跑模型了\"\n",
    "\n",
    "- toc: true\n",
    "- badges: true\n",
    "- comments: true\n",
    "- hide: false\n",
    "- categories: [tensorflow,deeplearning,log]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 关于[DirectML](https://github.com/microsoft/DirectML)\n",
    "详情看README。主要是支持所有“支持DX12的显卡”进行硬件加速运算。这对于手上只有A卡的我无疑又是个好东西。  \n",
    "\n",
    "先前已经写过一篇关于[PlaidML](https://bjchacha.github.io/myblog/ai/2020/02/25/using-plaidml-keras-to-ai.html)的博文，也是可用A卡硬件加速。所以这里省略部分步骤，并最后跟PlaidML比较下效果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 安装DirectML\n",
    "目前[Tensorflow with DirectML](https://docs.microsoft.com/zh-cn/windows/win32/direct3d12/gpu-tensorflow-windows)仅支持最新版本的Windows 10和WSL。  \n",
    "\n",
    "安装非常简单，直接`pip`一下就好。这里例行使用`conda`创建虚拟环境来运行DirectML。\n",
    "\n",
    "> conda create -n directml python=3.7  \n",
    "> conda activate directml  \n",
    "> pip install tensorflow-directml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Note: DirectML只支持Tensorflow 1.15."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用DirectML训练Fashion-MNIST分类器\n",
    "这里跑跟之前试用PlaidML时一样的代码，方便对比。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": "WARNING:tensorflow:From C:\\Users\\bjcha\\Anaconda3\\envs\\directml\\lib\\site-packages\\tensorflow_core\\python\\ops\\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\nInstructions for updating:\nIf using Keras pass *_constraint arguments to layers.\nTrain on 60000 samples\nEpoch 1/10\n60000/60000 [==============================] - 14s 226us/sample - loss: 0.5786 - acc: 0.7869\nEpoch 2/10\n60000/60000 [==============================] - 13s 214us/sample - loss: 0.4072 - acc: 0.8522\nEpoch 3/10\n60000/60000 [==============================] - 13s 213us/sample - loss: 0.3625 - acc: 0.8681\nEpoch 4/10\n60000/60000 [==============================] - 13s 216us/sample - loss: 0.3340 - acc: 0.8788\nEpoch 5/10\n60000/60000 [==============================] - 13s 213us/sample - loss: 0.3179 - acc: 0.8842\nEpoch 6/10\n60000/60000 [==============================] - 13s 218us/sample - loss: 0.3003 - acc: 0.8903\nEpoch 7/10\n60000/60000 [==============================] - 13s 213us/sample - loss: 0.2891 - acc: 0.8933\nEpoch 8/10\n60000/60000 [==============================] - 13s 213us/sample - loss: 0.2768 - acc: 0.8980\nEpoch 9/10\n60000/60000 [==============================] - 13s 211us/sample - loss: 0.2704 - acc: 0.9011\nEpoch 10/10\n60000/60000 [==============================] - 13s 212us/sample - loss: 0.2586 - acc: 0.9046\n10000/10000 [==============================] - 1s 87us/sample - loss: 0.2360 - acc: 0.9130\ntraining time cost: 129.5 s, accuracy: 0.9130\n"
    }
   ],
   "source": [
    "#collapse\n",
    "import tensorflow as tf\n",
    "from time import time\n",
    "\n",
    "data = tf.keras.datasets.fashion_mnist\n",
    "(x_train, y_train), (x_test, y_test) = data.load_data()\n",
    "\n",
    "x_train = x_train.astype('float32').reshape(-1, 28, 28, 1) / 255.\n",
    "x_test = x_test.astype('float32').reshape(-1, 28, 28, 1) / 255.\n",
    "# print(x_train.shape)\n",
    "\n",
    "model = tf.keras.Sequential([\n",
    "    tf.keras.layers.Conv2D(\n",
    "        filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28, 28, 1)\n",
    "    ),\n",
    "    tf.keras.layers.MaxPool2D(pool_size=2),\n",
    "    tf.keras.layers.Dropout(0.3),\n",
    "    tf.keras.layers.Conv2D(\n",
    "        filters=32, kernel_size=2, padding='same', activation='relu'\n",
    "    ),\n",
    "    tf.keras.layers.MaxPool2D(pool_size=2),\n",
    "    tf.keras.layers.Dropout(0.3),\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(units=256, activation='relu'),\n",
    "    tf.keras.layers.Dropout(0.5),\n",
    "    tf.keras.layers.Dense(units=10, activation='softmax')])\n",
    "model.compile(\n",
    "    optimizer='adam', \n",
    "    loss=tf.keras.losses.sparse_categorical_crossentropy,\n",
    "    metrics=['accuracy'])\n",
    "\n",
    "train_start = time()\n",
    "model.fit(x_train, y_train, batch_size=64, epochs=10)\n",
    "train_end = time()\n",
    "_, accuracy = model.evaluate(x_test, y_test)\n",
    "print('training time cost: {0:.1f} s, accuracy: {1:.4f}'.format(train_end-train_start, accuracy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面可以看到DirectML可以正常使用A卡进行训练，训练时长为129秒，而PlaidML跑了124秒。  \n",
    "虽说DirectML比PLaidML慢，但胜在支持所有DX12的显卡以及完整的Tensorflow（PlaidML只支持Keras）。"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "version": "3.7.7-final"
  },
  "orig_nbformat": 2,
  "file_extension": ".py",
  "mimetype": "text/x-python",
  "name": "python",
  "npconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "version": 3,
  "kernelspec": {
   "name": "directml",
   "display_name": "directml"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}